151 research outputs found

    Parallel Mapper

    The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, and split and join trees. In this paper, we study the parallel analysis of the construction of Mapper. We give a provably correct parallel algorithm to execute Mapper on multiple processors, and we report performance experiments that compare our approach to a reference sequential Mapper implementation and demonstrate the efficiency of our method.

    MaxMin Linear Initialization for Fuzzy C-Means

    Clustering is an extensive research area in data science. The aim of clustering is to discover groups and to identify interesting patterns in datasets. Crisp (hard) clustering assumes that each data point belongs to one and only one cluster. However, this is inadequate when some data points may belong to several clusters, as is the case in text categorization. Thus, we need more flexible clustering. Fuzzy clustering methods, where each data point can belong to several clusters, are an interesting alternative. Yet, seeding iterative fuzzy algorithms to achieve high-quality clustering is an issue. In this paper, we propose a new linear and efficient initialization algorithm, MaxMin Linear, to deal with this problem. Then, we validate our theoretical results through extensive experiments on a variety of numerical real-world and artificial datasets. We also test several validity indices, including a new validity index that we propose, Transformed Standardized Fuzzy Difference (TSFD).
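    To make the crisp versus fuzzy distinction above concrete, the following is a minimal sketch of the textbook fuzzy c-means membership formula for a single point. It is not the MaxMin Linear initialization or the TSFD index proposed in the paper; the function name `fuzzy_memberships`, the fuzzifier default `m=2.0`, and the toy centers are assumptions chosen for illustration.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Textbook fuzzy c-means membership of one point in each cluster.

    Illustrative only: `m` is the fuzzifier (assumed default 2.0, must be > 1).
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_j = d_j^(-e) / sum_k d_k^(-e), equivalent to 1 / sum_k (d_j / d_k)^e.
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical toy data: two cluster centers and one point between them.
centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = fuzzy_memberships((1.0, 0.0), centers)
# A crisp assignment keeps only the cluster with the largest membership.
crisp_label = max(range(len(centers)), key=lambda j: memberships[j])
```

    The memberships sum to one across clusters, so a point near a boundary keeps a graded degree of belonging to each cluster, whereas the crisp label discards that information.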

    Contextual and Behavioral Customer Journey Discovery Using a Genetic Approach

    With the advent of new technologies and the increase in customers’ expectations, services are becoming more complex. This complexity calls for new methods to understand, analyze, and improve service delivery. Summarizing customers’ experience using representative journeys that are displayed on a Customer Journey Map (CJM) is one of these techniques. We propose a genetic algorithm that automatically builds a CJM from raw customer experience recorded in a database. Mining representative journeys can be seen as a clustering task where both the sequence of activities and some contextual data (e.g., demographics) are considered when measuring the similarity between journeys. We show that our genetic approach outperforms traditional ways of handling this clustering task. Moreover, we apply our algorithm to a real dataset to highlight the benefit of using a genetic approach.

    How (not) to measure bias in face recognition networks

    In recent years, Face Recognition (FR) systems have achieved human-like (or better) performance, leading to extensive deployment in large-scale practical settings. Yet, especially for sensitive domains such as FR, we expect algorithms to work equally well for everyone, regardless of somebody's age, gender, skin colour and/or origin. In this paper, we investigate a methodology to quantify the amount of bias in a trained Convolutional Neural Network (CNN) model for FR that is not only intuitively appealing, but also has already been used in the literature to argue for certain debiasing methods. It works by measuring the "blindness" of the model towards certain face characteristics in the embeddings of faces based on internal cluster validation measures. We conduct experiments on three openly available FR models to determine their bias regarding race, gender and age, and validate the computed scores by comparing their predictions against the actual drop in face recognition performance for minority cases. Interestingly, we could not link a crisp clustering in the embedding space to a strong bias in recognition rates; it is rather the opposite. We therefore offer arguments for the reasons behind this observation and argue for the need of a less naive clustering approach to develop a working measure for bias in FR models.
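    As an illustration of what an internal cluster validation measure on embeddings can look like, the sketch below computes the mean silhouette coefficient of embedding vectors grouped by a face attribute. The abstract does not say which validation measures the paper uses, so the choice of the silhouette coefficient, the function name `silhouette_score`, and the toy 2-D "embeddings" are all assumptions.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    Points in singleton clusters, or data with a single label, score 0.0
    here (an assumed convention for this sketch).
    """
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, per label.
        by_label = {}
        for j, q in enumerate(points):
            if i != j:
                by_label.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_label.pop(labels[i], None)
        if not own or not by_label:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own group
        b = min(sum(d) / len(d) for d in by_label.values())  # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical embeddings; the labels stand in for a face attribute.
embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = silhouette_score(embeddings, [0, 0, 1, 1])  # attribute forms tight groups
mixed = silhouette_score(embeddings, [0, 1, 0, 1])      # attribute is not clustered
```

    Under the methodology examined in the abstract, a score near zero or below would be read as the model being "blind" to the attribute. The abstract's own finding is that such crisp clustering could not be linked to a strong bias in recognition rates, so a score like this should not be taken as a bias measure on its own.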

    Classification of frequency response areas in the inferior colliculus reveals continua not discrete classes

    A differential response to sound frequency is a fundamental property of auditory neurons. Frequency analysis in the cochlea gives rise to V-shaped tuning functions in auditory nerve fibres, but by the level of the inferior colliculus (IC), the midbrain nucleus of the auditory pathway, neuronal receptive fields display diverse shapes that reflect the interplay of excitation and inhibition. The origin and nature of these frequency receptive field types are still open to question. One proposed hypothesis is that the frequency response class of any given neuron in the IC is predominantly inherited from one of three major afferent pathways projecting to the IC, giving rise to three distinct receptive field classes. Here, we applied subjective classification, principal component analysis, cluster analysis, and other objective statistical measures to a large population (2826) of frequency response areas from single neurons recorded in the IC of the anaesthetised guinea pig. Subjectively, we recognised seven frequency response classes (V-shaped, non-monotonic Vs, narrow, closed, tilt down, tilt up and double-peaked) that were represented at all frequencies. We could identify similar classes using our objective classification tools. Importantly, however, many neurons exhibited properties intermediate between these classes, and none of the objective methods used here showed evidence of discrete response classes. Thus receptive field shapes in the IC form continua rather than discrete classes, a finding consistent with the integration of afferent inputs in the generation of frequency response areas. The frequency disposition of inhibition in the response areas of some neurons suggests that across-frequency inputs originating at or below the level of the IC are involved in their generation.

    Clustering Algorithms: Their Application to Gene Expression Data

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a considerable opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and it is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.